When Naïve is not Enough: Bringing Naïve Bayes Text Categorization to "Surface"
Abstract
As more information becomes available in digital format, especially on the World Wide Web, organizing and classifying digital documents, making them accessible, and presenting them properly have become important issues. Digital Library Management Systems (DLMSs) are an example of systems that manage collections of digitized multimedia data and include components for storing, accessing, retrieving, and analyzing those collections. Solutions for content organization, access, and interaction are required so that users can express their information needs, especially when the specific request is not clear in their minds [2]. Automatic categorization systems may be used to build classification schemes (or taxonomies, or subject hierarchies) for browsing, exploring, and retrieving resources from collections of digital objects. Visualization frameworks that give a straightforward graphical explanation of the organization and classification of data have been successfully designed and implemented [3]. However, these visualization approaches have rarely, if ever, been applied in the area of textual document categorization. In fact, the representation of textual information is particularly challenging: how can the semantics of textual documents be captured and represented through graphs? A recent probabilistic approach to Automated Text Categorization (ATC) that represents documents in a two-dimensional space [1, 4] has been shown to be a valid visualization tool for understanding the relationships between categories of textual documents, and for helping users visually audit the classifier and identify suspicious training data. In this work, we apply the same idea of a two-dimensional representation of documents to the case of the Naïve Bayes (NB) classifier.
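The abstract does not specify how the two coordinates are computed. As a rough illustration only, the sketch below assumes one plausible reading: each document is placed at the point (log P(d | category), log P(d | all other documents)) under a multinomial Naïve Bayes model with add-one smoothing and equal class priors. The function names, the toy data, and the choice of coordinates are all assumptions made for this example and are not taken from the paper.

```python
import math
from collections import Counter


def train_counts(docs):
    """Sum token counts over a list of tokenized documents."""
    counts = Counter()
    for tokens in docs:
        counts.update(tokens)
    return counts


def log_likelihood(tokens, counts, vocab_size):
    """Multinomial NB log P(d | class) with add-one (Laplace) smoothing."""
    total = sum(counts.values())
    return sum(
        math.log((counts[t] + 1) / (total + vocab_size)) for t in tokens
    )


def two_d_point(tokens, in_counts, out_counts, vocab_size):
    """Assumed 2-D coordinates: (score for the category, score for the rest)."""
    return (
        log_likelihood(tokens, in_counts, vocab_size),
        log_likelihood(tokens, out_counts, vocab_size),
    )


# Hypothetical toy training data: one category and its complement.
sports = [["goal", "match", "team"], ["team", "win", "match"]]
other = [["stock", "market", "price"], ["market", "bank", "price"]]

vocab = {t for d in sports + other for t in d}
in_counts = train_counts(sports)
out_counts = train_counts(other)

x, y = two_d_point(["team", "match", "goal"], in_counts, out_counts, len(vocab))

# With equal priors (an assumption), the document is assigned to the
# category when its point lies on the x > y side of the diagonal.
predicted_in_category = x > y
```

Plotting such points for a whole collection would put documents of the category on one side of the line x = y and the remaining documents on the other. Points near or across the diagonal are the ones a user might inspect when auditing the classifier or its training data.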
Similar resources
Comparing SVM and Naive Bayes classifiers for text categorization with Wikitology as knowledge enrichment
The activity of labeling of documents according to their content is known as text categorization. Many experiments have been carried out to enhance text categorization by adding background knowledge to the document using knowledge repositories like Word Net, Open Project Directory (OPD), Wikipedia and Wikitology. In our previous work, we have carried out intensive experiments by extracting know...
Comparison of Decision Tree and Naïve Bayes Methods in Classification of Researcher’s Cognitive Styles in Academic Environment
In today world of internet, it is important to feedback the users based on what they demand. Moreover, one of the important tasks in data mining is classification. Today, there are several classification techniques in order to solve the classification problems like Genetic Algorithm, Decision Tree, Bayesian and others. In this article, it is attempted to classify researchers to “Expert” and “No...
Text Categorization using Association Rule and Naive Bayes Classifier
As the amount of online text increases, the demand for text categorization to aid the analysis and management of text is increasing. Text is cheap, but information, in the form of knowing what classes a text belongs to, is expensive. Automatic categorization of text can provide this information at low cost, but the classifiers themselves must be built with expensive human effort, or trained fro...
Arabic Text Categorization
In this paper, we compare the performance of three classifiers for Arabic text categorization. In particular, the naïve Bayes, k-nearest-neighbors (knn), and distance-based classifiers were used. Unclassified documents were preprocessed by removing punctuation marks and stopwords. Each document is then represented as a vector of words (or of words and their frequencies as in the case of the naï...
Publication date: 2007